
[VL] Make VeloxResizeBatchesExec inherit from ColumnarToColumnarExec to simplify the code#10763

Merged
zhztheplayer merged 2 commits into apache:main from Zouxxyy:dev/resize_batch on Sep 30, 2025

@Zouxxyy (Contributor) commented Sep 21, 2025

What changes are proposed in this pull request?

  • Make VeloxResizeBatchesExec inherit from ColumnarToColumnarExec to simplify the code.
  • Make ClosableIterator generic.
  • Add isSameConvention to ColumnarToColumnarTransition.
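The ClosableIterator bullet can be illustrated with a small sketch. This is not Gluten's actual ClosableIterator; `SimpleClosableIterator[T]` and `WrappedIterator` are hypothetical names, used only to show what a type parameter buys: one close-once implementation that works for any element type rather than a single fixed batch type.

```scala
// Hypothetical sketch, not Gluten's real ClosableIterator.
// The element type T is a parameter, so the same close-once logic
// serves iterators over batches, rows, or anything else.
abstract class SimpleClosableIterator[T] extends Iterator[T] with AutoCloseable {
  private var closed = false

  // Subclasses supply the actual iteration and cleanup logic.
  protected def hasNextInternal: Boolean
  protected def nextInternal(): T
  protected def closeInternal(): Unit

  // A closed iterator reports no further elements.
  final override def hasNext: Boolean = !closed && hasNextInternal

  final override def next(): T = {
    if (!hasNext) throw new NoSuchElementException("iterator exhausted or closed")
    nextInternal()
  }

  // Idempotent close: cleanup runs at most once.
  final override def close(): Unit = {
    if (!closed) {
      closed = true
      closeInternal()
    }
  }
}

// Example subclass: wraps any Scala iterator and counts cleanup calls.
class WrappedIterator[T](underlying: Iterator[T]) extends SimpleClosableIterator[T] {
  var closeCount = 0
  protected def hasNextInternal: Boolean = underlying.hasNext
  protected def nextInternal(): T = underlying.next()
  protected def closeInternal(): Unit = closeCount += 1
}
```

For instance, `new WrappedIterator(Iterator(1, 2, 3))` and `new WrappedIterator(Iterator("a", "b"))` share the same implementation; closing either one twice still runs cleanup only once.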

How was this patch tested?

@github-actions github-actions bot added the VELOX label Sep 21, 2025
@Zouxxyy Zouxxyy marked this pull request as draft September 21, 2025 10:08
@github-actions github-actions bot added the CORE works for Gluten Core label Sep 21, 2025
@github-actions

Run Gluten Clickhouse CI on x86

@Zouxxyy Zouxxyy marked this pull request as ready for review September 21, 2025 17:22
@Zouxxyy (Contributor Author) commented Sep 22, 2025

CC @zhztheplayer. Thanks!

@zhztheplayer (Member) left a comment

Thanks for the PR.

Could you help check the UI sanity, especially the metrics? Please post a screenshot here if possible. Thanks!

@Zouxxyy (Contributor Author) commented Sep 22, 2025

@zhztheplayer Sure, here is the UI for SELECT max(l_orderkey) FROM lineitem.

One metric description changed:
"time to append / split batches" -> "time to convert batches"

[Screenshot]

@zhztheplayer (Member) left a comment

LGTM. Thank you for the screenshot.

 def unapply(plan: SparkPlan): Option[SparkPlan] = {
   plan match {
-    case c2c: ColumnarToColumnarTransition =>
+    case c2c: ColumnarToColumnarTransition if !c2c.isSameConvention =>

@zhztheplayer (Member):
@Zouxxyy Sorry, I missed this change earlier. Could you explain its purpose? Thanks.

@Zouxxyy (Contributor Author):

ColumnarToColumnarLike is used by the Coster to calculate C2C costs. Previously, VeloxResizeBatchesExec didn't inherit from ColumnarToColumnarTransition, so it wasn't included in the cost calculation. This guard preserves that behavior.

Besides, in my testing, without this guard SELECT max(l_orderkey) FROM lineitem does not produce a VeloxResizeBatchesExec node, because the node is eliminated after the cost calculation.
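To make the `if !c2c.isSameConvention` guard concrete, here is a simplified model. `PlanNode`, `Leaf`, `C2CTransition`, `C2CLike`, and `CostModel` are hypothetical names, not Gluten's API, and conventions are plain strings. The extractor matches only C2C nodes whose input and output conventions differ, so a same-convention node such as a batch resize is not charged as a conversion.

```scala
// Hypothetical, simplified model of the guard; not Gluten's API.
sealed trait PlanNode { def children: Seq[PlanNode] }

case class Leaf(name: String, convention: String) extends PlanNode {
  def children: Seq[PlanNode] = Nil
}

// A columnar-to-columnar node converting batches from one convention to another.
case class C2CTransition(name: String, from: String, to: String, child: PlanNode)
    extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
  // True when the node does not change the batch convention (e.g. a resize).
  def isSameConvention: Boolean = from == to
}

// Extractor shaped like the diff: only real conversions match.
object C2CLike {
  def unapply(plan: PlanNode): Option[PlanNode] = plan match {
    case c2c: C2CTransition if !c2c.isSameConvention => Some(c2c.child)
    case _ => None
  }
}

object CostModel {
  // Toy cost: each matched conversion costs 1, everything else is free.
  def c2cCost(plan: PlanNode): Int = {
    val own = plan match {
      case C2CLike(_) => 1
      case _ => 0
    }
    own + plan.children.map(c2cCost).sum
  }
}
```

With this guard, a Velox-to-Velox resize contributes nothing to the cost, while a Velox-to-Arrow conversion still counts as one.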

@zhztheplayer (Member):

Currently VeloxResizeBatchesExec is added by the rule at https://github.com/apache/incubator-gluten/blob/cd2c0cca9b9478a050bfbb90f15e75f99e7adcd2/backends-velox/src/main/scala/org/apache/gluten/extension/AppendBatchResizeForShuffleInputAndOutput.scala#L31-L59 without any coster involved. Am I missing something? How can the node be eliminated after the rule has added it?

@Zouxxyy (Contributor Author):

@zhztheplayer I'm not sure my understanding is correct, but in InsertTransitions we first remove ColumnarToColumnarLike nodes (via removeForNode) and then call fillWithTransitions:

case class InsertTransitions(convReq: ConventionReq) extends Rule[SparkPlan] {
  private val convFunc = ConventionFunc.create()

  override def apply(plan: SparkPlan): SparkPlan = {
    // Remove all transitions at first.
    val removed = RemoveTransitions.apply(plan)
    val filled = fillWithTransitions(removed)
    val out = Transitions.enforceReq(filled, convReq)
    out
  }

  // ... (other members omitted from this excerpt)
}

However, VeloxBatchType registers no transition that would recreate VeloxResizeBatchesExec, so once the node is removed, it is lost:

object VeloxBatchType extends Convention.BatchType {
  override protected def registerTransitions(): Unit = {
    fromRow(Convention.RowType.VanillaRowType, RowToVeloxColumnarExec.apply)
    toRow(Convention.RowType.VanillaRowType, VeloxColumnarToRowExec.apply)
    fromBatch(ArrowBatchTypes.ArrowNativeBatchType, ArrowColumnarToVeloxColumnarExec.apply)
    toBatch(ArrowBatchTypes.ArrowNativeBatchType, Transition.empty)
  }
}
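The remove-then-refill sequence above can be reproduced with a toy model, under simplifying assumptions: a plan is a linear chain listed from root to leaf, `removeTransitions` drops every node a given matcher classifies as a transition, and `fillWithTransitions` re-inserts a transition only where adjacent conventions differ, looked up in a registry keyed by (from, to). `Op`, `ToyTransitions`, and the registry entries are hypothetical, not Gluten's API. Because the registry has no Velox-to-Velox entry, a stripped resize is never re-created.

```scala
// Hypothetical toy model; names and structure are illustrative, not Gluten's API.
// A plan is a linear chain listed from root to leaf.
case class Op(name: String, in: String, out: String, isTransition: Boolean)

object ToyTransitions {
  // Transitions keyed by (childOutputConvention, parentInputConvention).
  // Loosely mirrors VeloxBatchType above: row <-> Velox exists, Velox -> Velox does not.
  val registry: Map[(String, String), String] = Map(
    ("row", "velox") -> "RowToVelox",
    ("velox", "row") -> "VeloxToRow"
  )

  // Strip every node the matcher classifies as a removable transition.
  def removeTransitions(plan: List[Op], isRemovable: Op => Boolean): List[Op] =
    plan.filterNot(isRemovable)

  // Insert a registered transition wherever a parent's input convention
  // differs from its child's output convention. Toy assumption: a matching
  // entry always exists (a missing one throws NoSuchElementException).
  def fillWithTransitions(plan: List[Op]): List[Op] = plan match {
    case parent :: (rest @ (child :: _)) if parent.in != child.out =>
      val t = registry((child.out, parent.in))
      parent :: Op(t, child.out, parent.in, isTransition = true) :: fillWithTransitions(rest)
    case parent :: rest => parent :: fillWithTransitions(rest)
    case Nil => Nil
  }
}
```

With the naive matcher `_.isTransition`, a chain shuffle -> resize -> agg (all Velox) comes back as shuffle -> agg. With a same-convention guard such as `op => op.isTransition && op.in != op.out`, the resize survives.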

@Zouxxyy (Contributor Author):

It seems that the current C2C transitions are added automatically, whereas the resize node is added manually, so we should not remove it. It is also an open question whether the Velox resize should be included in the cost calculation.

zhztheplayer merged commit 5a1664c into apache:main on Sep 30, 2025
57 checks passed
@zhztheplayer (Member):

Merging, thank you @Zouxxyy. I feel there is an opportunity to refactor the code so that it has less impact on the transition planner, but I haven't had a chance to experiment yet. I will revisit this later.

zhztheplayer added a commit to zhztheplayer/gluten that referenced this pull request Nov 12, 2025
…tion

It's an improvement following apache#10763. The PR decouples ColumnarToColumnarExec from ColumnarToColumnarTransition so that each can be implemented on its own. If a class inherits from ColumnarToColumnarExec but not from ColumnarToColumnarTransition, it is not considered a transition, so the RemoveTransitions rule will not remove it.
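The decoupling described in this commit message can be sketched with plain Scala traits. `Node`, `C2CExecBase`, `TransitionMarker`, `AutoConversion`, `ManualResize`, `Scan`, and `RemoveTransitionsSketch` are hypothetical names for illustration. Only the idea comes from the commit message: the removal rule keys off the transition trait alone, so a node that reuses the exec base class without the transition trait survives.

```scala
// Hypothetical sketch of separating an exec base class from a transition marker.
trait Node { def name: String }

// Shared implementation base for columnar-to-columnar operators (illustrative).
abstract class C2CExecBase(val name: String) extends Node

// Marker trait: only nodes carrying it are treated as planner-managed transitions.
trait TransitionMarker extends Node

// An automatically inserted conversion: both a C2C exec and a transition.
class AutoConversion extends C2CExecBase("AutoConversion") with TransitionMarker

// A manually inserted resize: reuses the exec base but is NOT a transition.
class ManualResize extends C2CExecBase("ManualResize")

class Scan extends Node { val name = "Scan" }

object RemoveTransitionsSketch {
  // Removal matches on the marker trait only, not on the exec base class.
  def apply(plan: List[Node]): List[Node] =
    plan.filterNot(_.isInstanceOf[TransitionMarker])
}
```

Under this split, the planner can freely strip and re-insert `AutoConversion` nodes, while a manually added resize keeps the shared C2C implementation without being touched by the removal rule.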

Labels

CLICKHOUSE, CORE (works for Gluten Core), VELOX

2 participants